Mass Difference Matching Unfolds Hidden Molecular Structures of Dissolved Organic Matter

Ultrahigh-resolution Fourier transform mass spectrometry (FTMS) has revealed unprecedented details of natural complex mixtures such as dissolved organic matter (DOM) on a molecular formula level, but we lack approaches to access the underlying structural complexity. We here explore the hypothesis that every DOM precursor ion is potentially linked with all emerging product ions in FTMS2 experiments. The resulting mass difference (Δm) matrix is deconvoluted to isolate individual precursor ion Δm profiles and matched with structural information, which was derived from 42 Δm features from 14 in-house reference compounds and a global set of 11 477 Δm features with assigned structure specificities, using a dataset of ∼18 000 unique structures. We show that Δm matching is highly sensitive in predicting potential precursor ion identities in terms of molecular and structural composition. Additionally, the approach identified unresolved precursor ions and missing elements in molecular formula annotation (P, Cl, F). Our study provides first results on how Δm matching refines structural annotations in van Krevelen space but simultaneously demonstrates the wide overlap between potential structural classes. We show that this effect is likely driven by chemodiversity and offers an explanation for the observed ubiquitous presence of molecules in the center of the van Krevelen space. Our promising first results suggest that Δm matching can both unfold the structural information encrypted in DOM and assess the quality of FTMS-derived molecular formulas of complex mixtures in general.


INTRODUCTION
Complex mixtures are key study objects in environmental and industrial applications, but their analysis remains challenging. 1−4 One of the most complex mixtures in natural ecosystems is dissolved organic matter (DOM). 5,6 DOM is a central intermediate of ecosystem metabolism and mirrors molecular imprints of interactions with its abiotic and biotic environment, 7−9 which form the basis for processes such as carbon sequestration and nutrient recycling. 10,11 Despite significant advances in ultrahigh-resolution mass spectrometry (FTMS) 2,4 and nuclear magnetic resonance spectroscopy, 12 scientists still struggle to decode this information on the molecular level, 13−17 and novel approaches to identify distinct structures are required to translate molecular-level information into improved process understanding.
Open and living systems promote the formation of ultracomplex mixtures of thousands to millions of individual constituents 18,19 that mirror large environmental gradients. 20−22 As a consequence, DOM poses significant challenges to separation, isolation, and structure elucidation. Direct infusion (DI) FTMS techniques have become indispensable tools for the molecular-level analysis of DOM as they reveal unprecedented details of molecular formulas using the exact mass (MS1 data, m/z) even without prior separation. 23 However, FTMS techniques are selective and do not resolve all structural detail observed at the exact mass in DOM, as the presence of isobars and isomers hinders the identification of particular structures from these molecular formulas. 19,23−25 Additionally, current structural databases cover only a small fraction of the molecular formulas encountered and typically lead to annotation rates <5%. 18,26,27 One way to obtain structure information on isomers and isobars is through collision-induced dissociation (CID; MS2, or multistage MSn). 27−29 The relatively wide isolation window (∼1 Da) of mass filters applied for precursor ion selection commonly hinders the isolation and subsequent fragmentation of single exact masses, leading to mixed "chimeric" MS2 spectra of co-fragmented precursor ions. 30 Even though some authors achieved isolation of single masses or improved description of chimeric tandem MS data, fragmentation patterns were found to be universal across DOM samples. 18,19,31−35 Most of these studies, however, focused on the major product ion peaks (fragments), which usually make up only 60−70% of the total product ion abundance, and thus disregarded many low-abundance signals that may be more suitable for detecting structural differences. 19,31 The major product ions encountered in tandem mass spectra of DOM relate to sequential neutral losses of common small building blocks, mainly CO2, H2O, or CO units. 14,33
A mass difference between a precursor and a product ion in an MS2 spectrum is herein called "delta mass" and referred to as Δm (plural Δm's). Common Δm's such as CO2 or H2O are deemed nonindicative for the identification of structural units. 18,28,31,33,36 In contrast, other studies found recurring low-m/z product ions (e.g., at m/z 95, 97, 109, 111, 123, 125, 137, 139, 151, and 153) that were interpreted as a limited set of core structural units substituted with a set of functional groups, yet in different amounts and configurational types that would lead to highly diverse mixtures. 37−44 From a stochastic standpoint, the occurrence of common neutral losses may not be surprising; many structures contain hydroxyl groups that could yield H2O losses, and CO2 could originate from ubiquitous carboxyl groups. 45 In contrast, the occurrence of two molecules sharing a larger substructure would be less probable and less easily detected as a major peak. Signatures of DOM's structural diversity could thus prevail in the high number of low-abundance fragments usually detected below m/z 200−300, as opposed to the higher abundance of fragments connected to losses of CO2 or H2O. Given the large number of estimated isomers and isobars underlying usual DOM data, 18,19,31,32,39,45−48 we here build upon the hypothesis that every co-fragmented precursor ion potentially contributes to every emerging product ion signal. We interpret the resulting chimeric MS2 data as a structural fingerprint that can be deconvoluted to obtain individual precursor ion Δm matching profiles. The analysis of Δm's that link precursor and product ions is independent of the masses of the unknown precursor ions and known reference compounds in databases of annotated Δm features, and therefore does not rely on indicative product ions (fragments) alone.
Although this approach sacrifices the identification of true knowns, it allows for the identification of potential structural analogues via indicative Δm's and may be especially suited when annotation rates are as low as in the case of DOM, i.e., when most compounds are unknown. 18,26,27 Despite the unknown identity of most of the molecules present in DOM, its potential sources can be constrained reasonably well. Plants produce most of the organic matter that sustains food webs in natural ecosystems. Plant metabolites such as polyphenols thus represent a major source of DOM. Therefore, an early decomposition phase likely exists when the imprint of soluble/solubilized plant metabolites is still detectable by MS2 experiments using current FTMS technology: Lignin-related compounds show indicative methoxyl/methyl radical losses, 18,49,50 glycosides indicate a sugar loss, 51,52 and hydrolyzable tannins may lose galloyl units. 52 Mass differences related to atoms such as N, S, P, Cl, Br, I, and F could also help to identify unknown organic nutrient species or disinfection byproducts, thereby widening the applicability of the approach. 1,53 Finally, indicative Δm fingerprints could provide constraints to putative compound group annotations derived from molecular formula data alone (van Krevelen diagrams) or allow for a more precise annotation. 54−56 We hypothesized that DOM from swamps and topsoil, in close contact with plant inputs and active microbial communities, would reflect recognizable plant-related source imprints that can be revealed by tandem mass spectrometry. Specifically, we explored links between precursor ion Δm matching profiles and precursor ion characteristics such as nominal mass, mass defect, initial ion abundance, fragmentation sensitivity, oxygen-to-carbon ratio (O/C), heteroatom content, and structure suggestions.
These properties are in part predictable from the assigned molecular formula and thus allow for an evaluation of the approach ("proof of concept") while also revealing potential nonassigned molecules (e.g., P-, Cl-, Br-, I-, and F-containing molecular formulas). Finally, we hypothesized that indicative Δm features of plant phenols, e.g., lignin- and tannin-related losses, would match their yet unknown structural analogues in DOM and that these patterns would reflect commonly applied "structural domain" distributions. 55,57,58

EXPERIMENTAL SECTION
A detailed experimental procedure is provided in the Supporting Information of this article (Note S-1). In short, we chose 14 aromatic reference compounds as representative plant metabolites (Figure S-1 and Table S-1) and a forest topsoil pore water isolate 59 and Suwannee River Natural Organic Matter (SRNOM) 60 as exemplary DOM samples. All reference and sample solutions were directly infused into the electrospray ionization (ESI) source of an Orbitrap Elite (Thermo Fisher Scientific, Bremen) in negative ionization mode (Table S-2) and fragmented by collision-induced dissociation (CID, MS2). We chose four nominal masses within the mass range typically observed in terrestrial DOM samples (m/z 200−500) for fragmentation (m/z 241, 301, 361, and 417, herein referred to as isolated precursor ion mixtures, "IPIMs") to test the approach. 61 Soil DOM was analyzed at three normalized collision energy (NCE) levels (15, 20, and 25%). MS3 spectra of selected key product ions (aglycons of flavonoids and demethylated dimethoxy-methyl-benzoquinone) were acquired at NCE 20 or 25. After recalibration with known (Table S-3) or predicted product ions (losses of CO2, H2O, etc.), all major product ions were annotated with a molecular formula in reference compounds (Figure S-2, Tables S-4 and S-5) and DOM. Formula annotation was conducted with a MATLAB routine recently incorporated into an open FTMS data processing pipeline. 62 For MS2 data analysis, we generated Δm matrices of every pairwise combination of precursor and product ions ("Δm fingerprints"). Every value in this matrix is referred to as a Δm feature or Δm. We compared the unknown Δm features in DOM to three lists of known Δm features: (a) 54 Δm features ubiquitously found in DOM (Table S-…), (b) Δm features derived from the reference compounds, and (c) SIRIUS Δm features. The detection of a known Δm feature in DOM is herein called "Δm matching", and detected Δm features are called Δm matches. Matching was conducted at a mass tolerance of ±0.0002 Da (2 ppm at 200 Da).
The array of Δm matches of a single precursor ion is called the Δm matching profile, and all precursor ion profiles of an IPIM form the subset of matched Δm's of the Δm matrix introduced above. The decomposition of the MS2 spectrum into a Δm matrix, and therefore into individual Δm matching profiles, is what we define as the deconvolution step in this study. Δm's of lists (a) and (b) showed some overlap and were largely part of list (c) as well. The specificity of any Δm feature in list (c) was checked by its association to compound classes as defined by ClassyFire. 66 The top 15 significantly associated classes were then obtained for each Δm feature in list (c) and included in analyses using the reference-compound-derived list (list b) as well.
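The deconvolution and matching steps described above can be illustrated with a minimal Python sketch. It is not the routine used in this study; the m/z values, the short list of known Δm features, and the function names `delta_m_matrix` and `match_profile` are hypothetical and chosen only for illustration. The tolerance of ±0.0002 Da is the one stated in the text.

```python
TOLERANCE_DA = 0.0002  # matching tolerance stated in the text

# Illustrative subset of known Δm features (monoisotopic masses in Da).
KNOWN_DELTA_M = {
    "CH3": 15.023475,  # methyl radical
    "CO": 27.994915,
    "CO2": 43.989829,
    "H2O": 18.010565,
}


def delta_m_matrix(precursors, products):
    """Pair every precursor ion with every lighter product ion.

    Returns {precursor m/z: [Δm, ...]}, i.e. one row of the Δm matrix
    per co-fragmented precursor ion.
    """
    return {p: [p - f for f in products if f < p] for p in precursors}


def match_profile(delta_ms, known=KNOWN_DELTA_M, tol=TOLERANCE_DA):
    """Return the sorted names of known Δm features found in one row."""
    return sorted(
        name
        for name, mass in known.items()
        if any(abs(d - mass) <= tol for d in delta_ms)
    )


# Hypothetical chimeric MS2 spectrum: two precursors at nominal mass 301.
precursors = [301.03540, 301.07180]
products = [257.04557, 283.02483, 286.01190]

matrix = delta_m_matrix(precursors, products)
profiles = {p: match_profile(row) for p, row in matrix.items()}
```

In this toy spectrum, all three product ions are assigned to the first precursor, while the second precursor yields no match within the tolerance, which is the kind of separation the Δm matching profiles are meant to provide.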
We assessed the probability of false-positive matches and accounted for the number of elements in the formula, ion abundance, and measures of fragmentation sensitivity to validate our approach. The matching data were combined for each NCE level and transformed into a binary format. We classified Δm matching profiles of DOM precursor ions and reference compounds of lists (b) and (c) by two-way hierarchical clustering using Ward's method and Euclidean distance, as well as principal component analysis (PCA) in PAST (v3.10) for list b. 67 We visualized numbers of individual Δm matches and Δm cluster matches in van Krevelen space for all lists. We chose the structural domains reprinted in the 2014 review by Minor et al. for reference because this represents the general level of detail and type of classes distinguished in recent DOM studies (Figure S-4). 57,58,68−70 In two separate analyses, formulas were also classified with a more general and a data-based van Krevelen scheme besides the reference one. 58,71 Finally, we assessed the agreement between structures predicted by Δm matching and those suggested in natural product structural databases. We combined structure suggestions from different databases, including Dictionary of Natural Products, 72 KNApSAcK, 73 MetaCyc, 74 KEGG, 75 and HMDB, 76 as well as their expanded in silico annotations based on predicted enzymatic transformations in the MINEs database. 77 Although the MINEs database covers 198 generalized chemical reaction rules, it may not include all potential environmental reactions because those are not solely driven by enzymes. The InChIKey of structures was used to exclude stereoisomers and classify suggested structures into compound classes by ClassyFire. 66
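As a sketch of the data preparation described above, the following Python example converts Δm matching profiles into binary vectors and computes pairwise Euclidean distances, i.e., the dissimilarities on which a Ward-type clustering would operate. The clustering itself was done in PAST and is not reproduced here; the ion labels, profiles, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def to_binary(profiles, features):
    """Encode each profile as 0/1 flags over a fixed feature order."""
    return {
        ion: [1 if f in matched else 0 for f in features]
        for ion, matched in profiles.items()
    }


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_table(binary):
    """Pairwise distances between all binary profiles."""
    return {
        (a, b): euclidean(binary[a], binary[b])
        for a, b in combinations(sorted(binary), 2)
    }


# Hypothetical matching profiles of three precursor ions.
features = ["CH3", "CO", "CO2", "H2O"]
profiles = {
    "ion_A": {"CO2", "H2O"},
    "ion_B": {"CO2", "H2O", "CH3"},
    "ion_C": {"CH3"},
}

binary = to_binary(profiles, features)
distances = distance_table(binary)
```

For binary data, the squared Euclidean distance equals the number of Δm features in which two profiles differ, so ion_A and ion_B (one differing feature) end up closer than ion_A and ion_C (three differing features).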

Fragmentation Behavior of Soil DOM. DOM precursor ions were isolated and fragmented to obtain Δm data via matching (Figure S-5). To find the best collision energy to fragment DOM, we analyzed soil DOM at three NCE levels (15, 20, and 25). All IPIMs showed similar fragmentation properties (Note S-3 and Table S-…).
Figure 2 (caption, partially recovered). […] see legends in every plot. Colored boxes in indicative VK plots mark the expected structural region of formulas that would be expected to yield the respective Δm, and colors refer to the structural motifs marked in (m−q). Phenylpropanoid or benzenoid-like (sub-)structures such as the ones shown in empty circles (o, p, q) may also contain methyl or methoxy groups (filled orange dots in m, n) that could produce methyl radical losses. Calculations based on Δm data are presented in more detail in Table S-13. Highlighted red open diamonds in (e) and (k) indicate loss of up to three gallic acid equivalents (size not drawn to scale).
Product ion spectra did not indicate abrupt structural changes upon increasing NCE, showing no separation of isomers/isobars but a continuous increase in fragmentation across all precursor ions. Based on the above results, an NCE of 25 was chosen to fragment SRNOM for comparison.
Despite common differences between precursor ion abundance and O/C ratio or mass defect (Figure 1a,d), we found a significant positive link between both metrics and fragmentation sensitivity independent of nominal mass, ranging from half-life NCE (i.e., the NCE level causing a 50% decrease in ion abundance) of 10−35 under our instrumental settings (calculated from linear fits). Remarkably, this trend was not observed in reference compounds (Figure 1b,e). Such a discrepancy has also been observed by Zark et al. for the common CO2 loss and was interpreted as a result of intrinsic averaging. 31,45 In contrast, Dit Fouque et al. described the potential separation of less complex isomer mixtures by ramped fragmentation. 29 Bearing in mind the limitation that we only analyzed four IPIMs here, our results support the intrinsic averaging hypothesis and indicate that fragmentation sensitivity may be an additional property shaped by DOM complexity. 18,20,45 It also supports our assumption of a high number of isomers and isobars "hidden" beneath each precursor ion molecular formula, which also increases the probability of detecting meaningful links between precursor and product ions. A minor group of oxygen-poor formulas was nonresponsive (Note S-3). Matching to list c showed no significant relation to O/C ratio but to mass defect (Figure 1c,f). In contrast to mass defect, initial ion abundance showed no link to fragmentation sensitivity but was significantly correlated to higher numbers of Δm matches (r = 0.41, R2 = 0.17, n = 157, p < 0.001; see also Tables S-9, S-10, S-11, and S-12). Together, this suggests that fragmentation sensitivity and Δm matching seem to be independent DOM precursor ion properties and that Δm matching could be driven by ion abundance. SRNOM and the soil water sample shared most molecular formulas (n = 107; 84% of soil DOM and 74% of SRNOM formulas) and accounted for most of the precursor ion abundance at NCE 25 (96.5% and 97.2%, respectively).
Despite this high similarity, SRNOM precursor ions showed higher numbers of Δm matches (Figure 1c,f), which could indicate that the same molecular formula is more chemodiverse, i.e., has more underlying structural formulas in SRNOM compared to soil DOM (Section 3.5).
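The half-life NCE used above as a measure of fragmentation sensitivity can be sketched as follows. This is a minimal illustration, assuming that precursor ion abundance is normalized to the unfragmented signal (1.0 at NCE 0) and that the half-life is read from an ordinary least-squares line; the abundance values and function names are hypothetical and not taken from the study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def half_life_nce(nce_levels, relative_abundance):
    """NCE at which the fitted relative abundance drops to 0.5.

    Returns None for nonresponsive ions (no decrease with NCE).
    """
    slope, intercept = linear_fit(nce_levels, relative_abundance)
    if slope >= 0:
        return None
    return (0.5 - intercept) / slope


# Hypothetical precursor ion: abundance relative to the unfragmented signal.
nce = [0, 15, 20, 25]
responsive = [1.00, 0.70, 0.60, 0.50]
nonresponsive = [1.00, 1.00, 1.00, 1.00]

half_life = half_life_nce(nce, responsive)
no_half_life = half_life_nce(nce, nonresponsive)
```

Because the fit is extrapolated, half-life values outside the measured NCE range (such as the upper end of the 10−35 range reported above) rest entirely on the assumption of linearity.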
3.3. Evaluation of the Δm Matching Approach. We used the matching data of molecular formulas in DOM for a proof-of-concept evaluation of our Δm matching approach. Specifically, we aimed to test the hypothesis that all precursor ions are potentially linked to all product ions in chimeric MS2 spectra of ultracomplex DOM. Our analysis was congruent with previous observations, showing losses of common Δm's (Table S-…). […] Our evaluation also shows that Δm matching not only helps in recalibration 79 but also serves to check formula annotation, as it revealed unresolved precursor ion compositions interfering especially with CHOS precursor ions (related to Cl, P, and F). This means that (1) these atoms should be included for better coverage of elemental composition (i.e., prioritization) in our specific sample context and that (2) higher resolving power may be required to resolve S-, Cl-, P-, or F-containing precursor ion compositions. 1 In summary, Δm matching revealed an inherently structured biogeochemical signal of precursor ions that seem to fragment individually and was highly sensitive in detecting precursor-product ion pairs. This suggests that chimeric DOM data can be deconvoluted to reveal differences in molecular composition not visible from MS1 inspection. 23,80 It should be stressed that these results will need further evaluation due to the small number of DOM precursor ions, m/z values, and samples analyzed here (159 in soil DOM, 221 in SRNOM), and that deconvolution should be further tested with better-characterized mixtures, including, e.g., structural analogues, artificial mixtures, or standard additions (spiking). 14
[…] (Table S-14), i.e., matching of precursor ions across Δm features and vice versa. In the following, precursor ion clusters will be referred to by letters (A−H) and Δm clusters by numbers (1−7; Table 1). Based on the specificity of SIRIUS Δm features (Table S-14), we defined five Δm clusters found herein as structure-specific (Table 1, Figure 2d,e,j,k; and Table S-13).
Table 1 (footnote, partially recovered). […] Figure S1; #12* and #13* refer to MS3 spectra of flavonoid aglycons. Numbers are coverage in Δm matches compared to overall Δm's per Δm cluster; values >20% are highlighted in bold, values <10% are grayed out. Δm clusters are shown in rows ("Cl. #", 1−7) and precursor clusters in columns (A−H; for details, see Table S14 and original clustering data in PANGAEA datasets). Additional columns show respective numbers of Δm matches ("n") and assigned cluster name (compare Table S14). In the lower row, numbers of precursors per precursor cluster are given for both samples combined and individually. Few reference compounds clustered with precursor clusters D−H, which were dominated by DOM precursors with higher numbers of Δm matches. Compounds #7, #12, #13, and #14 contain polyol moieties; compounds #1, #4, #5, and #6 contain -methoxy and -methyl moieties (Figure S1).
Environmental Science & Technology pubs.acs.org/est Article

CO and CH3• were also annotated to benzenoid structures via SIRIUS (Table S-14). The methyl radical loss is an expected diagnostic Δm of methoxylated aromatic rings as in lignin (orange square in Figure 2f,l; orange circles in panels m, n; see Note S-5), but was also matched to DOM precursor ions not classified as "lignin-like". 18,31,35,49 The Δm features CH3•, CO, and C2H4 were also linked to CH4 vs O series, which describe regular 0.0364 Da increments in DOM that are formally annotated by […]
Taken together, matching to Δm features derived from a small set of reference compounds revealed emerging clusters of precursor ion and Δm feature families that may prove more indicative if constrained with further DOM and reference compound data. 14 Anticipated structural domains were apparent but indicated clear overlap, which means that the same precursor ion was part of more than one Δm-predicted structural class. For example, 27 classic lignin-like precursor ions were part of seven precursor ion clusters (B−H; Table S-…). […]
[…] These represented on average 79 ± 19% (63 ± 31% in SRNOM) of all Δm matches per CHOS precursor ion or 91 ± 7% (79 ± 28%) of all CHNO precursor ion matches (Note S-6). CHNO precursor ions were annotated with reduced forms of N (including aralkylamines, amino acids, carboximidamides, and dicarboximides/urea-containing compounds, Table S-20) but not with nitrate esters. 34,84,85 S-containing Δm matches indicated the potential presence of sulfonic, thiol, thioether, or aromatic CHOS compounds. 86 These results show a wide potential diversity of N and S compounds in DOM that differs from earlier reports of mainly aromatic N and sulfonic S. 34,87,88 As most of these studies analyzed marine DOM, the detection of more diverse sets of CHOS and CHNO precursor ions could relate to the terrestrial, less degraded DOM analyzed here. 16,89−91 Further tests with N- and S-containing reference compounds and DOM samples are warranted to reveal the hidden diversity of CHNO/CHOS compounds and confirm potential structures, e.g., by NMR.
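The exact masses behind the Δm features discussed in this section, including the 0.0364 Da increment of the CH4 vs O series, can be reproduced with a short Python sketch. The monoisotopic masses are standard values; the simple formula parser and the name `exact_mass` are illustrative only and handle plain formulas without brackets or charges.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "S": 31.97207117,
}


def exact_mass(formula):
    """Monoisotopic mass of a plain formula such as 'CO2' or 'C2H4'."""
    return sum(
        MONOISOTOPIC[element] * int(count or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


# Δm features named in the text (the radical CH3• is written as CH3).
delta_m = {name: exact_mass(name) for name in ["CH3", "CO", "C2H4", "CO2", "H2O"]}

# Replacing one O by CH4 keeps the nominal mass but shifts the exact mass.
ch4_vs_o = exact_mass("CH4") - exact_mass("O")
```

The same small mass shift separates CO from C2H4 (both nominal mass 28), which shows why a tolerance well below 0.0364 Da is needed to tell these Δm features apart.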
All in all, our results show that it may be possible to refine molecular structure representations in van Krevelen plots by deconvoluted MS2 data and that complementary precursor ion information could be used to assess false or biased Δm-based class assignments (e.g., elemental composition, DBE, ionization, fragmentation sensitivity, ion mobility, polarity index, etc.). 13,55,58 Fluorescence or NMR spectroscopy could add valuable information if DOM were fractionated before MS2 data acquisition, i.e., to assess indirect (statistical) links of MS2 features with complementary forms of structural insight. 21,92−94 Our findings must, however, be taken with caution for four reasons: (1) SIRIUS Δm features (list c) were not obtained on the same instrument and thus may include features that, although correlated with certain compound classes, may not appear in DOM under the same instrumental settings.
(2) SIRIUS Δm features may be biased toward certain classes of compounds (Figure S-3), as is our set of 14 aromatic compounds. Here, we only considered negative ESI mode data, which is commonly employed for DOM analysis. Adding positive ESI or other ionization techniques would extend the range of Δm features and structural classes. 14,16,23,86 The same applies to fragmentation techniques other than CID.
(3) Product ion abundance was disregarded in our analysis, but could be used to weigh probabilities of potential precursor-product ion pairs in future, potentially in combination with fragmentation energy gradients (fragmentation trees), 95 moving m/z isolation windows, or ion accumulation time variation. 96 (4) Despite a seemingly improved separation of extreme classes (high H/C ratios in fatty acids, high O/C ratios in carbohydrates, etc.), potential overlap in structural class boundaries remained considerable (Figures 3 and  S-12).
Data-dependent and data-independent acquisition (DDA, DIA) techniques could be used to cover the whole mass range of precursor ions in DOM mass spectra in the future and are widely employed in LC-MS of complex mixtures. 16,27,97,98 For example, Ludwig et al. presented a DIA scheme (SWATH-MS) that employs one precursor ion scan and 32 isolation windows of 25 Da width, covering 800 Da within 3.3 s; similar schemes are likely transferable to acquire full mass range data of directly injected DOM. 99 Kurek et al. recently presented such data (m/z 392−408), 16 and Leyva et al. discerned fragmentation pathways and structural families (mass range m/z 261−477). 14 The latter approach could be extended to include the diversity of structure-associated Δm features presented here. Together, this shows that practicable tandem MS acquisition strategies are within reach and will enable deeper analyses of Δm features in DOM soon.
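The arithmetic of the DIA scheme cited above (32 windows of 25 Da, one precursor scan, 3.3 s per cycle) can be checked with a few lines of Python. The lower m/z bound of 400 and the assumption that every scan takes the same time are hypothetical and not taken from the cited work.

```python
def isolation_windows(start_mz, n_windows=32, width_da=25.0):
    """Return contiguous (lower, upper) m/z bounds for each isolation window."""
    return [
        (start_mz + i * width_da, start_mz + (i + 1) * width_da)
        for i in range(n_windows)
    ]


START_MZ = 400.0  # hypothetical lower bound, not taken from the text
CYCLE_TIME_S = 3.3  # cycle time stated in the text

windows = isolation_windows(START_MZ)
covered_da = windows[-1][1] - windows[0][0]

# One precursor ion scan plus one MS2 scan per isolation window.
scans_per_cycle = 1 + len(windows)
# Mean time per scan, assuming all scans take equally long.
mean_scan_time_s = CYCLE_TIME_S / scans_per_cycle
```

Under these assumptions, each scan has roughly 0.1 s available, which gives a sense of the time budget that a comparable scheme for directly infused DOM would have to work with.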

Drivers of Differences in Δm Matching between Soil DOM and SRNOM. Although matching among the two samples was largely consistent, slight differences were apparent in van Krevelen distributions (list b: Figure 2; list c: Figures 3 and S-12). We therefore tested the separation of precursor ion clusters by ordination (principal component analysis, Figure 4) using list b. Precursor ion clusters were clearly separated on Principal Components 1 and 2, which together held about 47% of variation. Most considered precursor ions were shared among samples (64%, 38 out of 59); only a small number were sample-specific (SRNOM = 14, soil DOM = 7). Sample-specific precursor ions were found in clusters A (linked to carboxylic acids), B (phenols, polyols), and C (benzenoids; Table 1), whereas the remaining clusters D−H were dominated by the shared precursor ions. Out of the 38 shared precursor ions, 30 (79%) grouped in the same precursor ion cluster despite a general trend to higher numbers of matches in SRNOM, but eight grouped differently (bold precursor ions in Figure 4). These differences in matching could be related to different chemistries, i.e., different isomeric/isobaric composition. 84 For example, the cluster "switch" in C11H14O6 was largely explained by higher ion abundance and Δm matches in SRNOM, while in C23H22O4, the effect could be partly linked to higher fragmentation resistance in SRNOM (Table S-21). Unfortunately, we only have data on initial ion abundance and fragmentation sensitivity from the soil DOM isolate; other precursor ion properties, however, showed very similar trends in both samples (Table S-21).
Similar clustering and Δm-predicted structural classes (Figure S-12) in shared precursor ions could indicate a conserved structural composition. Likewise, Kurek et al. observed high similarity in photoionized (APPI) and IRMPD-fragmented DOM samples but observed clear differences in CHOS fragmentation. 16 High similarities between DOM samples would be in line with stoichiometric principles (i.e., due to a large share in precursor ions between DOM samples) and could suggest that DOM processing diversifies, but also "randomizes" the molecular composition of each precursor ion ("universal" signal). 31,100,101 High congruence of fragmentation patterns (and thus, Δm matching) among DOM precursor ions has also been interpreted as a sign of similarly substituted but slightly differing core structures. 35,37 The clusters devised here were small due to the relatively small number of precursor ions and m/z values analyzed, and thus may not detect significant differences between samples yet. However, even with our small set of precursor ions, the clustering by Δm matching showed conserved differences in fragmentation between precursor ion clusters and, in part, even for the same precursor ion in different samples. The fact that this could relate to differences in ion abundance (and therefore, possibly also ionization efficiency) or fragmentation sensitivity is intriguing and should be investigated across a wider range of DOM chemotypes using improved classification approaches as applied here (see also Section 3.4). 14 In line with this, potential compositional differences between DOM samples became more apparent when more Δm features were used for the clustering (list c instead of list b; Figures 3 and S-12).

Ion Abundance Is Linked to Δm Matching Frequency and Structural Diversity. Ion abundance was the most important driver for Δm matching in both samples and highest in the van Krevelen plot "region" usually assigned to ubiquitous lignin structures or carboxyl-rich aliphatic molecules. 59,83 This region also coincides with a maximum in potential underlying chemodiversity, 30,102 which could explain why these signals are ubiquitously found and especially dominant in recycled DOM. 90,103 Δm matching showed potential to reveal this underlying chemodiversity effect and was therefore compared to numbers of structure suggestions and Δm-predicted compound classes per precursor ion (Figure 5). Numbers of Δm matches were significantly and positively related to the number of structure suggestions in absolute terms and for specific compound classes (Table S-21). The correlation between Δm-predicted and suggested compound classes was surprisingly similar in both samples and significant for almost all benzenoid-type (benzopyrans, methoxybenzenes, anisoles, phenols, etc.) and most phenylpropanoid-type structures (flavonoids, linear 1,3-diarylpropanoids). Among the organic acids, only vinylogous acids stood out (i.e., containing carboxylic acid groups with insertions of C=C bond(s)). Significant correlations were also found for pyrans, acryloyl compounds, carbohydrates, aryl ketones, and alkyl aryl ethers (fatty acids and analogues only in SRNOM).
The positive link between ion abundance and numbers of Δm matches on the one hand and predicted and suggested structures on the other indicates that ion abundance may be linked to the number of structural isomers and isobars per molecular formula in FTMS spectra of DOM and explains why Δm-defined structural classes showed strong overlap in this study. It also provides additional support to our assumption that all precursor ions potentially contribute to all product ions in DOM: The patterns revealed through Δm matching were largely congruent with the independent estimate of structural composition by natural product databases. The fact that only some classes of compounds (mainly benzenoids and phenyl-propanoids) showed significant correlations could point to bias toward plant natural products in the databases employed here; in turn, this means that the inclusion of other structure databases and the additional assignment of Δm's not only to their host structures but also to host organisms (e.g., in GNPS 65 ) could reveal further clues about the potential sources of molecular formulas in DOM.
We propose that the number of Δm matches could be interpreted as a novel, relatively easily accessible measure to account for a precursor ion's underlying potential structural diversity. Such information could help to better understand the mechanisms of DOM formation and persistence in the environment. Our results encourage further studies on the Δm matching behavior of synthetic mixtures of known structures and across DOM chemotypes, and the improved bioinformatic exploitation of chimeric (LC-)FTMSn data of complex organic mixtures. 14,104−106 We acknowledge that natural product and in silico databases are far from complete, as is the database of annotated Δm matches we used here, despite its large coverage of ∼18 000 unique structures and ∼11 500 Δm's (Figure S-3). For example, precursor ions with low mass defects showed exceptionally few structural hits, indicating bias in natural product databases (Figure S-13). 18 These structures were easily fragmented and yielded few Δm matches in our analysis. CHO precursor ions were twice as likely to yield a suggestion as N- and S-containing precursor ions. These observations show that DOM contains unique molecular structures yet to be identified, potentially through the application of a wider range of ionization and fragmentation techniques that reduce structural bias. 14,16,23

IMPLICATIONS
The interpretation of tandem MS data of complex samples such as dissolved organic matter (DOM) is impeded by the co-fragmentation of precursor ions with similar nominal mass and further complicated by the contribution of potential isomers and isobars. We employed an approach that analyzes the pairwise Δm's between all precursor and product ions (Δm matrix). Using a very limited set of precursor ion features from two samples, we found potential signs of structural imprints related to, e.g., benzenoids, phenylpropanoids, carbohydrates, sulfonic acids, thiols, thioethers, and amino acids. The successful matching of indicative Δm features and precursor ion clustering suggests a recognizable source imprint of primary or recycled plant remains in DOM. Tests with more DOM samples and artificial/treated mixtures (e.g., spiked DOM or enzyme-degraded DOM) are required to test the assumptions employed here and to improve classifications by Δm clustering. Our first results indicate that FTMS2 data may be useful to differentiate molecular composition on the molecular formula level and that ion abundance and fragmentation sensitivity are two key variables that explain differences in MS2 data within and among samples. This is intriguing because a shared molecular formula could harbor a completely different set of structures, and larger sets of DOM data would improve the detection of these differences. Generally, our findings support the view that regions of the van Krevelen plot are associated with indicative Δm's that relate to stoichiometric differences between compound classes. The most abundant precursor ions, however, showed a mixed MS2 signal that caused boundary overlap of these "Δm-defined regions" (Figures 3 and 5e,f). While this finding is in line with known patterns of structural […]
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.est.2c01332

Author Contributions
C.S. performed the measurements. D.P., V.N.R., P.D., and G.G. were involved in planning and supervising the work. K.D. and S.B. compiled global Δm feature data, analyzed its specificity, and performed structural classifications of Δm host structures as well as structure suggestions of DOM precursor ions. C.S. processed the experimental data, performed the downstream analyses, drafted the manuscript, and designed the figures. The manuscript was revised through the contributions of all authors. All authors have approved the final version of the manuscript.